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Abstract 



o 



In the Density Matrix Renormalization Group (DMRG) algorithm [T|, Hamiltonian symmetries play an important role. 
Using symmetries, the matrix representation of the Hamiltonian can be blocked. Diagonalizing each matrix block is 
more efficient than diagonalizing the original matrix. This paper explains how the the DMRG++ codeJ2i] has been 
extended to handle the non-local SU(2) symmetry in a model independent way. Improvements in CPU times compared 
to runs with only local symmetries are discussed for the one-orbital Hubbard model, and for a two-orbital Hubbard 
model for iron-based superconductors. The computational bottleneck of the algorithm and the use of shared memory 
parallelization are also addressed. 
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PROGRAM SUMMARY 

Manuscript title: Implementation of the SU(2) 
Hamiltonian Symmetry for the DMRG Algorithm 
Author: Gonzalo Alvarez 
Program title: DMRG+-h 
Licensing provisions: See file LICENSE. 
Source code: |http : / /www . ornl . gov/-gzl/| 

'dmrgPlusPlus/ 

Programming language: C++ 

Computer(s) for which the program has been 
designed: PC 

Operating system(s) for which the program has 

been designed: multiplatform, tested on Linux 

RAM required to execute with typical data: 1GB 

(256MB is enough to run included test) 
Has the code been vectorized or parallelized?: Yes 
Number of processors used: 1 to 8 with MPI, 2 to 4 
cores with pthreads 

Keywords: density-matrix renormalization group, dmrg, 
strongly correlated electrons, generic programming 
PACS: 71.10.Fd 71.27.+a 78.67.Hc 

CPC Library Classification: 23 Statistical Physics 
and Thermodynamics 

External routines/ libraries used: BLAS and LA- 
PACK 

CPC Program Library subprograms used: None. 
Nature of problem: Strongly correlated electrons 
systems, display a broad range of important phenomena, 
and their study is a major area of research in condensed 
matter physics. In this context, model Hamiltonians 
are used to simulate the relevant interactions of a given 
compound, and the relevant degrees of freedom. These 



studies rely on the use of tight-binding lattice models 
that consider electron localization, where states on one 
site can be labeled by spin and orbital degrees of freedom. 
The calculation of properties from these Hamiltonians is a 
computational intensive problem, since the Hilbert space 
over which these Hamiltonians act grows exponentially 
with the number of sites on the lattice. 
Solution method: The DMRG is a numerical 
variational technique to study quantum many body 
Hamiltonians. For one-dimensional and quasi one- 
dimensional systems, the DMRG is able to truncate, with 
bounded errors and in a general and efficient way, the 
underlying Hilbert space to a constant size, making the 
problem tractable. 
Running time: Varies 



1. Introduction 

In the DMRG algorithm[T] and other diagonalization- 
based methods, Hamiltonian symmetries play an impor- 
tant role. An operator 5 is a Hamiltonian symmetry if 
it commutes with the Hamiltonian, i. e., if [H, S] = 0. If 
S\^Pi) = sM, and S\ij2) = S2\i^2), then (^/-i |i?|V'2) = 
provided that si ^ S2- In words, H cannot "connect" 
states with different symmetries. The matrix representa- 
tion of H is then block diagonal, and diagonalizing each 
matrix block is more efficient than diagonalizing the orig- 
inal matrix. 

Reference [2] introduced DMRG++, a generic imple- 
mentation of the DMRG algorithm. There it was shown 
how to take advantage of local symmetries in a generic 
way, i. e., symmetries S, such that S = '^.^Si, where 5*^ 
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acts only on site i. In this paper the DMRG++ code is 
extended to handle the non-local SU(2) symmetry. 

Many Hamiltonians for strongly correlated electronic 
systems possess this symmetry, since they conserve the full 
spin. For example, the Heisenberg model with any spin, 
the Hubbard model[31 3] for any filling with one or multi- 
ple orbitals, and the t-J model^lH]- This is true as long as 
there are no external magnetic fields. The implementation 
of the SU(2) symmetry is involved, particularly if done in a 
generic way, but once implemented it provides substantial 
performance improvements to each of these models, e. g., 
the Hubbard model with one orbital runs four times faster 
for m > 400, as we will show. All this is achieved without 
introducing any approximations. 

Due to the wide applicability to various models, and 
the performance improvement that this symmetry brings, 
it is studied in detail in this paper, and is implemented in 
the accompanying DMRG-f-f code, which can be found at 
|ittp: //www. ornl.gov/~gzl/dmrgPlusPlus/ . Section [2] 
describes the implementation details of the SU(2) sym- 
metry for the Hilbert space basis in a model independent 
way. The work on Hilbert space operators is described 
in section [3j including performance improvements by us- 
ing reduced operators with the help of the Wigner-Eckart 
theorem. The performance bottleneck of the code is also 
analyzed, and shared memory parallelization is introduced 
for the performance critical parts of the code. 

In section |4] the method is applied first to the 
Hubbard model and then to a model for iron-based 
superconductors [7j. These are new materials whose super- 
conducting pairing mechanism, like in the cuprates, ap- 
pears to be of electronic origin. 

Finally, a summary is presented. The appendices con- 
tain a few derivations used in the text, as well as some 
documentation to be able to run the code. 

The problem discussed here was treated originally by 
McCuUoch et ai, in References [HllSl- Comparison to their 
results is provided. 

2. Hilbert Space Basis 

2.1. Basis on a Single Site 

Consider the usual[in] SU(2) operators , S~ = 
(5+)^, S'^ and 5*2 = ^{S+S- + S-S+) + {S')^. In ah 
physical cases these are spin operators-the SU(2) sym- 
metry is actually a full spin symmetry in the absence of 
magnetic fields-but this does not concern us at this point. 
We consider that the basis is diagonal in and S^' , i. e., 
S^\a) = j{j + l)\a), and S^\a) —m\a). In the code we work 
with J = 2j instead of j, and with m = m+j instead of m, 
because j and rh are always non- negative integers. Since 
it is standard notation, we will use (j, m) in the paper; the 
bijective mapping between (j, to) and (J, to) allows us to 
use (J, to) in the code. 

We consider that q is the quantum number associated 
with some local operator Q (in the case studies it will 



be the "total number of electrons", N^, operator). We 
will consider Hamiltonians that conserve 5^, S^, and, Q. 
Q can actually be formed by more than one local opera- 
tor, using the effective symmetry procedure described in 
Ref. [2]. However, Q should not include 5^, that is treated 
explicitly instead. In general, j, m and q, (or equivalently 
J, m, q) does not completely determine the states of the 
basis. 

In addition to j, to, and q, we now introduce a fourth 

quantum number, flavor or /, that will be useful in our 

implementation of the SU(2) symmetry for DMRG, but / 

will not necessarily be conserved. The following definition 

applies only to states that are eigenstates of S'^, i. e., that 

have a well defined j quantum number. We define the 
/ 

following relation \a) ~ \b), if Bp > such that either 
{S+)P\a) = ??pj,m|6) or (S+)P\b) = 77pj,m|a) holds, where 
Vp,j,m = U.lZl~^ 9j,m-x if P > and ryo.i.m = 1; and 
gj^m = v'jXj + 1) — rn{m + 1). That two states have the 

/ . 

same flavor (i. e., that \a) w \b)) immediately implies that 
they have the same j value (i. e., that ja — jb, where the 
notation ja refers to the quantum number j of the state 

/ 

I a), and likewise for 6). The relation « is an equivalence 
relation that defines an equivalence class [|a)]/ for each 

element \a) e V. We assign a different non- negative integer 
to each equivalence class, and call this number, the flavor 
of that state. 

States with the same / and j belong to the same irre- 
ducible representation of SU(2). That states have the same 
j does not, hy itself, imply that they belong to the same 
matrix representing SU(2). For example, in the Hilbert 
space of one site with spin 1/2 electrons and a single or- 
bital, the empty state and the doubly occupied state have 
both j = 0, but they do not belong to the same (one- 
dimensional) matrix. In other words, these states are not 
connected by 5+. 

Two states have the same triplet j , m and /, if and only 
if they are equal (proof in Appendix [A|) . In other words, 
J, TO and / completely and uniquely determine the states 
of the basis. 

2.2. Basis on Multiple Sites: Outer Products 

Consider two vectors spaces with bases Vi and V2, re- 
spectively. Assume that the states in these bases are eigen- 
states of both and . Consider the vector space cre- 
ated by the outer product, V3 = Vi (8) V2. Let 5*+ : V3 — >■ 
V3, be such that ~ + 5^ (where the subindices 1 
and 2 indicate that acts only on Vi and 5^ acts only on 
V2), We define S"^ : V3 — V3 and Q : V3 — V3 in the same 
way, S- = {S+)\ and = ^{S+S- + S- S+) + {S')^. 
How can we construct a basis of this outer product whose 
states are also eigenstates of S'^ and S*^? (One immedi- 
ately notes that the states \a) (g) \b), with \a) e Vi, and 
1 6) € V2, are not necessarily eigenstates of 5^.) The most 
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general solution is 

\c) =J2GcM+bNM)®\b), (1) 

a, 6 

where A^i is the number of states in Vi. In the case of 

S*^ we have an ansatz for G in terms of Clebsh-Gordon 

coefhcients (see, e.g., Ref. PU|). 

Before proceeding to create the basis, we need to explain 

how to assign quantum numbers to the outer product of 

states. It is true that Q\a) \b) = {qa + qb)\a) \b) , and 

that S"=|a) (g) |6) = {nia + mb)\a) (g) \b). But \a) (g) |6) is 

not necessarily an eigenvector of 5^, as mentioned before. 

Therefore, it does not have a well defined flavor either. We 

now extend the definition of flavor for these states in the 

following way: \a) (g \b) w \a') (g \b') if and only if all these 

equalities hold: ja = ja', fa = fa>, and qa = qa'; jb = jb' , 

f 

fb = fb', and qb = qv ■ Again, ^ is an equivalence relation, 
and we define equivalence classes, and assign flavors as 
different non-negative integers to each equivalence class. 
In the few cases where |a) (g |6) is an eigenvector of S"^ , 
this new definition of / is equivalent to the previous one. 
Then, |a) g) |6) is assigned the flavor 

fa<g>b ^fa + fbFi + {qa + qbQi)FiF2 + 
+ {ja+JbJl)FiF2QlQ2, 

where /a < -Fi V|a) & Vi, qa < Qi, Ja < Ji, and likewise 
for V2. The rationale is similar to the one-site case; states 
with the same flavor belong to the same matrix representa- 
tion of SU{2). In Eq. ([!]), the pairs |a)(g |6) that contribute 
to a given state |c) all have the same flavor fa^bj which in 
turn becomes the flavor of state |c). 

Each pair of statetj^ (|a), \b)), with \a) G Vi, and \b) G 
V2, will contribute to one or more states |c) of V3. We 
first classify the pair a + bNi in the following way. We 
calculate all the allowed j,m that ja, rua and jb, nib give 
rise to. Then, we assign the pair a + bNi to each one of 
these Sj^rn,q=qc+qh subspaccs. After classifying all pairs we 
end up with a set of allowed j, m values, and there is one 
and only one subspace Sj^m,q for each one of those j,m 
values. The pairwise intersection of these subspaces is not 
necessarily empty, because one pair of states a + bNi may 
contribute to more than one state c in Eq. ([!]). 

Each subspace Sj^rn,q is represented by an object of class 
JmSubspace. We now need to determine how many basis 
states c for V3 are to be created, and what the correspond- 
ing factors G are. For these tasks, we run the loop given 
in listing [T] 

Listing 1: Loop that each subspace Sj^m.q of the outer product runs 
to determine the factors G of Eq. |TJ. 

size_t f lavorSaved = f lavorIndices_ [0] ; 
flavors. . push_back (flavorIndices_ [0] ) ; 
size_t counter=0; 



'^In the code, these pairs are denoted by a single number, a + bNi. 



for (size_t k=0 ; k< indi ce s_ . s ize ( ) ; k++ ) { 
if ( f lavor Indices _ [k] != f lavor Saved ) { 
flavors_.push_back(flavorIndices_[k]); 
counter ++ ; 

flavorSaved = f 1 avor Indi ce s _ [k] ; 

} 

// G ( f fs et + count er , indi ces_ [p erm [k] ) = 
// = values_ [perm [k] ] 

if (heavy_) factors . set ( indices. [perm [k] ] , 
offset + counter, values_ [perm [k] ] ) ; 

} 



In this loop indices_ contains the states a + bNi for this 
particular Sj_rn,q, and f lavorIndices_ the flavor of each 
a + bNi state. For each pair a + bNl we have computed a 
vector of values, that contains the Clebsch-Gordan coef- 
ficients {jaTTT'ajb'm'bljrn) . The states of the outer product, 
V3, are labeled here by of f set+counter, where offset is 
the number of states created by previous subspaces and 
counter is the number of states created by this subspace. 
Note that f lavorlndices is ordered to simplify the al- 
gorithm, which gives rise to a permutation perm. This 
loop does two main things: (i) it sets the flavor of each 
c, which is simply the flavor of the a + bNi values that 
form part of Eq. ([T]), and (ii) it sets G(off set + counter, 
indices, [perm [k] ) = values, [perm [k] ] , as explained 
before. Finally, flavors can be reassigned new numbers 
in the basis V3. 

Now we have created a completely (i. e., in j, m, and q) 
ordered basis for V3 = Vi g) V2 composed of eigenstates of 
5^ (and and Q). It is useful to be able to "disable" the 
SU(2) symmetry, which is done by just taking G in Eq. ([T]) 
to be the identity, i.e., Gc.a+bNi — Sc,a+bNi- We also 
need a permutation P^^ to account for effective symmetry 
ordering^. Then Eq. ([I]) becomes 

|c) = ^Gpl2(c),a+6Ari|a) g) (3) 
a,b 

When the SU(2) symmetry is "enabled", G is non-trivial 
and P^^ is the identity, and vice-versa. In the DMRG 
procedure, three types of outer products will appear, and 
there will be three factors G and three permutations P at 
each DMRG step. 

The subspaces Sj^m.q for a given outer product space 
can be heavy or light, and this is denoted by the boolean 
heavy, in listing [T] When adding a new site to the system 
or to the environment, the subspaces are always heavy. 
When forming the superblock (by combining system and 
environment) the subspaces are heavy if j — jtarget o,nd 
q = qtarget, and hght otherwise, where jtarget and qtarget 
are the j and q values of the ground state to be considered 
by the DMRG algorithm. Heavy subspaces compute the 
factors G, light subspaces compute only the offsets. This 
is done for performance reasons; the factors G are only 
computed when needed. 

2.3. Change of Basis 

For the DMRG basis transformation the first order of 
business is to calculate the density matrix for system and 
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environment. If we label the states with \ j, m, /) we get 



their product is: 



Pjimifi:j[rn{f[ — ^ '/'jiwii /i :i2m2/2 /i yamS/a • (4) 

One roadblock here is that does not necessarily con- 
serve or 5^. To solve this problem, McCuUoch et al. 
successfully proposed[5] to use the SU(2) invariant reduced 
density matrix, 



S[ji,mi] 



X! ^jimifi 

j2,ni2,/2 



J2"l2 



/2^Jl"ll/(;j2m2/2 ) 



(5) 



instead of Eq. Q , and modify the DMRG truncation pro- 
cedure accordingly. 

The DMRG truncation procedure with pf^^ is as fol- 
lows. We diagonalize , and consider its eigenvectors 
ordered in increasing eigenvalue order. Let m be a 
fixed number that corresponds to the number of states in 
V{S) that are to be kept. If m > ^V{S), then W remains 
unchanged. But if m < #V(5), then W is truncated 
by discarding all states above m, and thus W becomes 
a rectangular matrix of size m x ^V{S). The basis of 
V{S) is transformed by applying the (possibly truncated) 
linear transformation . Operators are transformed in 
the usual way (i/^"™ ''^'')c.^o.' = {W'^)-l^{H^)^,yW^, . 
This procedure is repeated for the environment block. 

The transformed state W\a) has the same flavor as \a) 
(see Appendix [b]). If \j,m,f) is to be discarded, then we 
need to be sure to discard all states \j, to', /) for all to', else 
the remaining basis will not preserve the SU(2) symmetry. 
Alternatively, if \j, to, /) is to be discarded but \j, to' 7^ 
TO,/) is not, then |j, to, /) is kept. 



3. Operators and Optimizations 

3.1. Product of Operators 

As mentioned before, three types of outer products need 
be considered: (i) for the outer product of system and a 
newly added site(s), (ii) for the outer product of environ- 
ment and a newly added site(s) and (iii) for the outer 
product of system and environment. For the first two the 
corresponding bases are stored in objects of class Dmrg- 
BasisWithOperators. For the outer product of system and 
environment, the outer product is done on-the-fly only be- 
cause of memory storage reasons. If and are both 
in the system their product is: 



a,b,a' 

'-^pS(c'),a'+fcJV3' 



(6) 



where Sa — (/)"° , na is the number of electrons in state a, 
and / = — 1 if A and B anticommute or / = 1 if they com- 
mute. If is in the system and B^ is in the environment 



[A^B^], 



E 

G 



r^SE 



i^aA^ iBJ^^Ij 



SE 



(7) 



3.2. Wigner-Eckart Theorem and Reduced Operators 

The sums in Eq. (|6| and Eq. ([t]) can be performed in a 
faster wayfSI thanks to the Wigner-Eckart theorem. If op- 
erator A transforms as the representation of SU(2) labeled 



by J,M, then {ffm'\AUfjm) 
where 

{ff\\A'\\fj)^-l—x 



E Ci-/l,Jffm'\Ai,\fjm). 



(8) 



,M,9 



Since all operators that appear in constructing the Hamil- 
tonian (for example, in the case of the Hubbard model, 
and S^, in the case of the Heisenberg model) transform 
as some representation of SU(2), then these formulas can 
always be applied. 

To "reduce", for example, Eq. ([6]), we will first write 



Gc,a+bNi — C^mf,rnJ',mb^fa»bJa 



(9) 



where fa(S)b is given in Eq. ([2]), and then we will gather the 
sums over to, M, to' together. 

We use throughout the notation \a) = Ifaja^ia)- Let 
us assume that we have calculated the reduced operators 
ifijiW^^Wfaja) using definition Eq. 0, and similarly for 
ifa'ja'WB^Wfiji). We assume that A° and B^ are both 
in the system, that they commute (and then / = 1) or 
anticommute (and then / = —1), that Sa = (/)"° as be- 
fore, that A^ transforms as the irreducible representation 
of SU(2) labeled by Ja and Ma, and that B^ transforms 
as the irreducible representation of SU(2) labeled by Jb 
and Mb- 

We replace {A^)a,i = C^lf^^,„„(/;Ji|l^^ll/aJa), and the 
equivalent for {B^)i^a' into Eq. We obtain an ex- 

pression for {A^B^)c,c' in terms of {jifi\\A^\\jafa) and 
ifa'ja'WB^Wfdi). Then we calculate {fc'.]c'\\{AS BS)\\fJ,) 
again using Eq. ([s]) in terms of {A^ B^)c,c' , and replace 
{A^B^)c^c' by the expression we obtained before. The end 
result is 



aR,bR,a' n,lfi,JA,JB 



(10) 



X Sf.^bfJf^'^J.' {flil\W\\faja){fa'ja'\\B''\\fljl)~Sa, 

where 



E 

'jl^JAja Qja'^'^B ,jl 



mciTria ^rrib m^i ,m^f ,mb 



(11) 



C: 



and an represents a sum over and ja but not over ma, 
and likewise for the other indices with subscript R. Note 
that the factor depends on an, bn, a' n, Iji, Ja, and 
Jb- 

We assume now that is an operator in the environ- 
ment. Then a similar treatment of Eq. ([7| yields: 



-SE 



a.R,bH,a' R,b' R,J a,Jb 
X ifa'ja' I \A^\\faja) {fb'W 1 1 1 1 /bjb) , 



where 



E 



^ mc-,Taci^mh m^i ,m^t 



m^l ,MA,ma "T-b' ^A'/b ^ftT-iy ■ 



(12) 



(13) 



In the code, the class ReducedOperators keeps track of 
the reduced operators, and the class Su2Reduced calculates 



Eq. (10 1 and Eq. (12 1. This results in a substantial speed- 
up. 

3.3. Shared Memory Parallelization 

The most time consuming part of the DMRG method 
applied to strongly correlated electronic models is the com- 
putation of Hamiltonian connections between system and 



environment. These connections take the form c. 



for 



the Hubbard model, and S^S", SfS^ for the Heisenberg 
model, and are generically represented by Eq. ([7| or its re- 
duced form as explained before. There are a few of these 
connections in the case of the one-orbital Hubbard model 
on a one dimensional chain. There are a few dozen in the 
case of the two-orbital Hubbard model for iron-based su- 
perconductors on a ladder. Then, these connections can 
be parallelized using, for example, pi/ireadaj^ and the ac- 
celeration brought about by this procedure depends on the 
model, as the results of the next section show. 



4. Case Studies 

4.1. One-orbital Hubbard Hamiltonian 

The one-orbital Hubbard model is given by: 



(14) 



Symmetry 


m 


Energy 


CPU 


Local 


226 


-76.751582 


5332 


Local 


468 


-76.751733 


86681 


Local 


716 


-91.751739 


320911 


SU(2) 


226 


-76.751582 


1103 


SU(2) 


468 


-76.751733 


7564 


SU(2) 


716 


-76.751739 


36640 


SU(2) j=5 


226 


-74.527742 


1574 


SU(2) j=5 


468 


-74.565375 


7188 


SU(2) j=5 


716 


-74.570932 


20364 



Table 1: Results for the Hubbard model with U = 1, Via = —0.5, 
and t = 1 on a 60-site chain. Column 2 contains the m total states 
kept in each case (this is called D in Ref . [8] ) . Energies are in column 
3. A factor of U N/2 = 1 X 30/2 = 15 has been added to all energies to 
compare with Ref. [8]. CPU times in seconds are in the last column. 
All rows but the last three refer to the ground-state with j = 0. The 
last three rows are for the lowest eigenstate with j = 5. 



M 


SU(2) 1 proc 


SU(2) 2 procs 


Local 1 proc 


100 


42 


41 


67 


200 


160 


136 


319 


300 


335 


290 


808 


400 


544 


485 


1602 


800 


3020 


2526 


>2 hours 



Table 2: Times in seconds to run the one-orbital Hubbard model on 
32 sites at half filling, with U = t = 1. Runs done with 2 processors 
used shared memory parallelization with pthreads. 



These results are shown in Table [T] for j = and for j = 5. 
The infinite algorithm used m as given in the table, fol- 
lowed by one full sweep with the same m. 

Having validated these results Table [2] gives additional 
CPU times for the Hubbard model on 16 sites. In all 
cases "Local" denotes the symmetries = + and 
Sz = n^ — n^, whereas "SU(2)" denotes the symmetries n^, 
Sz and s^. 



4.2. Spin 1/2 Heisenberg Model 

This model is given by the Hamiltonian, ^j^- JijSi ■ Sj, 
and has full spin symmetry. In this case, and using a 32- 
site chain with J.y- = 1 only between nearest neighbors, the 
SU(2) symmetry yields a speed-up factor roughly between 
5 to 10, depending on m. However, the shared memory 
parallelization performs poorly, because this model has few 
connections between system and environment blocks. 



This model has the SU(2) symmetry if we define 5+ = 

J2i4t'^tl> '^'^ = J2ii4t'^it ~ C4C4), and S"^ as usual from 
these operators and their transpose conjugates. 

We start by reproducing results published in Ref. [5] 
with U — 1, Via- = —0.5, tij = 1 between nearest neigh- 
bors, and zero elsewhere, on a 60-site chain at half filling. 



■^Pthreads or POSIX threads is a standardized C language threads 
programming interface, specified by the IEEE POSIX standard. 



4.3. Hamiltonian of Iron- Based Superconductors 

In early 2008, high-temperature superconductivity was 
discovered [11 j in the iron pnictides. Except for the 
cuprates, the iron-based superconductors now have the 
highest superconducting critical temperature Tc of any 
material} 12] . Iron-based superconductors contain conduct- 
ing layers of iron and arsenic. As in the cuprate supercon- 
ductors, in the pnictides there is also evidence that the 
superconductivity is not mediated by the electron-phonon 
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interactioii[T3], but appears to be of electronic origin in- 
stead. 

A tight-binding two-orbital Hubbard model for the iron 
pnictides has been proposed [3 |T3]. This model's kinetic 
energy is given by 



K 



2,Q,7,7',(T 



(15) 



where 



x+y 





-t2 



-t2 







-h 

-ta +t4 
+ti -ta 



(16) 



The interaction is: 

Hint — Uq ^ ni^a,t^i,a,l + 



Ui rii^xni^y + U2 Si^x ■ Si,y 

i i 



(17) 



where n,- 



4 a o-Ci.a,5 and i = y, t and a = a. 
With this definition, C/q = U,Ui^U' - J/2, U2 = -2J 
and C/3 = — J. Moreover, usually U' — U — 2 J. 

This model has SU(2) symmetry if we define = 



Ei,^4t7C47' = J2i,^i4t'(''iti - 4i-f(^iii)^ and 52 as 
usual from these operators and their transpose conjugates. 
The sum over 7 is a sum over the two orbitals, a and 6 or 
and 1. In this model, the efficiency achieved by the use of 
the SU(2) symmetry is modest. This can be seen, for ex- 
ample, in Fig. [1] by comparing open circles with squares. 
In no case was the gain found to be larger than a factor 
of 1.5, and in most cases it was only about 20% to 30% 
depending on m and on the number of lattices sites. 

However, the possibility of working with a given total 
spin ground state facilitates the study of the nature of 
ground states. For example, using the SU(2) symmetry it 
is easier to determine if the ground state is a singlet or a 
triplet. Without the help of the full spin symmetry one 
would have to run with various Sz target states and infer 
from them which one has the lowest energy. 

Using the SU(2) symmetry, the CPU times for this 
model, which is implemented in class FeBasedSc, are given 
in Fig. [1] The model is expressed on a 2-leg ladder with 
parameters [H] ti = 0.058, t2 = 0.2196, ta = 0.20828, and 
t4 = 0.079. The figure shows the run with a single core and 
with two cores, parallelized via pthreads. For m >= 300, 
CPU times are cut by almost a factor of 2, the theoreti- 
cal maximum, because this model, being formulated on a 
ladder, has many connections, making the shared paral- 
lelization efficient. 

We end this section on a technical note. In this model 
the real-space basis on a single site has two states that 
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Figure 1: CPU times in seconds divided by 10, for the model given 
by Eqs. l |15)17| l, running on a single eore with full spin symmetry 
(squares), and with two cores and full spin symmetry (filled circles). 
The two-core run was done with shared memory parallelization via 
pthreads. For comparison, the open circles are runs with one core 
and without the SU(2) symmetry. All runs were carried out on a 
2x4 ladder, with fixed m = 100 for the infinite algorithm, and with 
a full finite sweep with the indicated m. 



and |9) = c|^c|^|0). In DMRG++, real-space basis states 
are coded using a binary number representation, the bit 
X indicates if there's an electron with internal degree of 
freedom, x = ^ + aNo, where No is the number of orbitals, 
7 is the orbital number (0 for a and 1 for 6), and a is 
the spin (0 for t and 1 for |). For example, c|^c|j, |0) has 
binary number 110 or 6. 

States 1 6) and |9) are reinterpreted as l/-\/2(|6)-|-|9)) and 
l/-\/2(|6) — 1 9)), respectively. This reinterpretation occurs 
when calculating operators, such as cj^^, in this real-space 
basis, and allows a binary number representation of states 
to still be used in this case, even when the original states 
were not eigenstates of 5^. 



5. Summary 

By making use of the full spin symmetry to those mod- 
els that possess it, the DMRG procedure runs faster. For 
the one-orbital Hubbard model on a one-dimensional lat- 
tice, the speed-up factors were approximately 4 on a 32- 
site lattice, and approximately 5 to 10 on a 60-site lattice. 
All these factors depend on m, as detailed in the tables. 
The speed-up factor for the two-orbital Hubbard model for 
iron-based superconductors (FeBasedSc) on a 2-leg ladder 
was modest, and never exceeded 1.5. 

The efficiency gained by using the SU(2) symmetry is 
due to the smaller size of the Hamiltonian matrix blocks 
that need to be diagonalized. This effect is countered by 
the overhead imposed by performing basis transformations 
using the factors described in Eq. ([T]). However, by em- 
ploying the Wigner-Eckart theorem and using reduced fac- 
tors and operators, it is possible to bring down the cost 
of these transformations significantly. The overall effect 
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is the decrease in CPU times mentioned in the previous 
paragraph. 

Additionahy, shared memory parallehzation was used 
to parallehze the calculation of Hamiltonian connections 
between system and environment. The success of this 
method depends on the model, and is most effective when 
there are many connections. For the FeBasedSc model 
running with 2 cores the speed-up almost reached the the- 
oretical maximum of a factor of 2. 

Strongly correlated electronic models for iron-based su- 
perconductors (implemented in the FeBasedSc DMRG-I--)- 
class) is a topic of intense study in condensed matter. 
Of particular interest is the origin and mechanism of the 
pairing in these superconductors. The DMRG algorithm 
provides an accurate way of extracting information from 
the models in this context (for a recent paper, see, e.g.^ 
Ref. [ID). 

DMRG++ is a free and open source implementation 
of the DMRG algorithm. It emphasizes generic program- 
ming using C++ templates, friendly user-interface, and as 
few software dependencies as possible. DMRG++ tries to 
make writing new models and geometries easy and fast by 
using a generic DMRG engine. 
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A. Two states have the same triplet j, m and /, if 
and only if they are equal. 

Let \a) and \b) be two states with the same j, m, and 
/. Without loss of generality we can assume that there 
3p > such that {S+Y\ 

^) ~ Vp,j,m\b)- Then, because \a) 
and |6) have the same and eigenvalue, p has to be 
zero, implying that |a) — ?7oj,m 

\b) = 1 6). The reciprocal 
holds because a given state has unique values for j, m, and 
/. The uniqueness of the first two is trivial. Flavor is also 
unique in a given basis, since a state cannot belong to two 
different equivalence classes. 

B. The reduced DMRG Transformation Conserves 
Flavor 

Here we prove that W\j,m,f) has well defined fiavor. 
Without loss of generality assume that (5+)P|j,m,/) ~ 
f?pj",m|j, "1 + p,f). Since p conserves j, m, then W 



does too, and W\j,m,f) = X)/' ^/'/' /'); where 
lYJ^m |.]-^g matrix block of W corresponding to the 
good quantum numbers j,m. Then {S^YWljym, f) = 
?7pj-,m TO + p, /'). Since the reduced den- 

sity matrix does not depend on m, then nor does W. In 
other words, W^'^+p = VF^.™, and so iS+)PW\j,mJ) = 
'r]p,j,mW\j,m+p, /), implying that W\j, to, /) has well de- 
fined flavor. We also proved that and W commute, 
and since applying 5*+ does not change flavor and W does 
not change j or m, then flavors can be assigned in the 
same way to W\j, m, /) as were assigned to |j, m, /). That 
flavors can be assigned without applying 5*+ saves us from 
keeping track of it through the DMRG procedure. 

C. Building and Running DMRG++ 

The requhed software to build DMRG++ is: (i) GNU 
C++, and (ii) the LAPACK library. This library is avail- 
able for most platforms. The configure.pl script will ask for 
the LDFLAGS variable to pass to the compiler/linker. If the 
Linux platform was chosen the default /suggested LDFLAGS 
will include -llapack. If the OSX platform was chosen 
the default/suggested LDFLAGS will include -framiework 
Accelerate. For other platforms the appropriate linker 
flags must be given. More information on LAPACK is here: 
http:/ /nctlib.org/lapack/. 

Optionally, make or gmake is needed to use the Makefile, 
and perl is only needed to run the configure.pl script. 

To Build and run DMRG++: 

cd src 

perl configure.pl 

(please answer questions regarding model, etc) 
make 

. / dmrg input . inp 

The perl script configure.pl will create the files 
main.cpp, Makefile and input. inp. This file can be 
used as input to run the DMRG++ program. To run the 
MPI code the command mpirun ./dmrg input, inp can 
be used, although the actual command will vary according 
to the local MPI Installation. 

There is also a test suite that can be run for all standard 
tests: 

cd TestSuite; . /testsuite .pi — all 

or a specific test can be selected and run by omitting the 
— all argument in the command above. Further details 
can be found in the file README in the code. 
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